Random Networks Growing Under a Diameter Constraint 
We study the growth of random networks under a constraint that the diameter, defined as the 
average shortest path length between all nodes, remains approximately constant. We show that if the 
graph maintains the form of its degree distribution then that distribution must be approximately 
scale-free with an exponent between 2 and 3. The diameter constraint can be interpreted as an 
environmental selection pressure that may help explain the scale-free nature of graphs for which 
data is available at different times in their growth. Two examples include graphs representing 
evolved biological pathways in cells and the topology of the Internet backbone. Our assumptions 
and explanation are found to be consistent with these data. 
Measurements on a wide variety of networks, such as
the World Wide Web, the Internet backbone,
social networks, and metabolic networks,
have shown that they differ significantly from the classic
Erdős-Rényi model of random graphs. While the
traditional Erdős-Rényi model has a Poisson node degree
distribution, with most nodes having a characteristic
number of links, these networks have highly skewed,
scale-free degree distributions approximately following a
power law $p(k) \sim k^{-\gamma}$, where k is the node degree and $\gamma$
is the scale-free exponent. To account for these observations,
random graph growth models have been developed
[11, 12, 13, 14, 15] that rely on the intuitively appealing
idea of preferential attachment.

For the most part, these models are of growing net- 
works in which new nodes are added to a graph by ad- 
dition of one or more edges to already existing nodes. 
The attachment is preferential because the likelihood of 
attachment to a node depends on the number of other 
nodes already linked to it. Thus, such models rely solely 
on endogenous factors, since they do not take account 
of any global exogenous selection pressures which might 
shape the form of evolving and growing networks. 

Such selection pressures would be especially relevant, 
for example, in a biological context. Measurements 
of the topological properties of graphs representing the 
metabolic networks of 43 organisms have demonstrated 
their scale-free nature. In all of these graphs, of
different sizes, it was found that the diameter, defined 
as the average shortest path between every pair of nodes 
in the graph, was constant. Such a constancy may be 
related to important properties relevant to the core func- 
tioning of these biological networks, such as the spread 
and speed of responses to perturbations.

Another example is the Internet backbone graph, 
whose growth and evolution over time has been studied 
by several authors. In this case, there are performance
and robustness constraints that such networks
must satisfy. These constraints can be thought of as
environmental pressures (which may operate indirectly) 
that would select against highly inefficient network structures.
One possibility may be a bias in favor of network
changes and additions that tend to maintain the average
shortest path. 

These two cases are relatively rare examples of network 
data representing graphs of varying size shaped by similar 
selection pressures, and they allow the testing of explanations
for their generic features. The main selection-based
explanation for scale-free network topologies relies on
the fact that such networks are robust with respect to
random malfunction of nodes. Robustness is identified
with the diameter of the network, and scale-free
networks maintain their diameter when nodes are elim- 
inated at random. However, while scale-free graphs are 
robust in this sense, it has not been shown that robust 
graphs must necessarily be scale-free. 

In this Letter, we argue that another notion of ro- 
bustness can be added to the error tolerance argument. 
Namely, scale-free networks are special in the sense that 
they can grow, with the same functional form for the 
degree distribution, and simultaneously maintain an ap- 
proximately constant diameter. This implies that when 
graphs grow and evolve in an environment in which there 
are selection pressures on the diameter, these graphs are 
likely to be scale-free. Our results help clarify the connection
between the apparent constancy of the diameter and
the scale-free topology of the graphs in the two examples 
studied. 

We consider random graphs of different sizes under the 
constraint that the diameter remain approximately con- 
stant as the graph grows. We show that if the graph 
maintains the form of its degree distribution then that 
distribution must be approximately scale-free with an exponent
between 2 and 3. These results may help explain
the scale-free nature of graphs, of varying sizes, representing
the evolved metabolic pathways of different organisms
and the topology of the Internet backbone. Our assump- 
tions and results are consistent with empirical findings. 

We start with an expression for the diameter d of a
random graph with arbitrary degree distribution, developed
in Ref. [13] using a generating function formalism
applied to the degree distribution $p(k)$,

$$ d = \frac{\ln N}{\ln(z_2/z_1)} - \frac{\ln z_1}{\ln(z_2/z_1)} + 1. \qquad (1) $$



This formula depends only on the number of nodes in 
the graph N, the average degree of the nodes $z_1$, and
the average number of nodes $z_2$ that are reachable in
two edge traversals from an arbitrary node. The formula 
was derived by considering an ensemble of random graphs 
(without explicit clustering; see Refs. [13, 20] for details),
and calculating from the degree distribution the average 
number of nodes within some radius of edges away from 
a random node. As the radius gets larger, the number 
of nodes enclosed grows rapidly. When that number of 
nodes is approximately equal to N, the corresponding 
radius is a good approximation to the average shortest 
path in the graph, since most of the nodes are at that 
distance. If the random graph is directed, the same argument
applies, and Eq. (1) is valid when $p(k)$ is the
degree distribution of outgoing edges.

It is important to emphasize that Eq. (1) is an approximate
formula for the average shortest path of a random
graph (without clustering or assortative mixing, etc.).
While the formula is not numerically precise, it does capture
the dependence of the diameter on N [20].
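
As a quick check of how this estimate behaves, consider the Poisson (Erdős-Rényi) case. The sketch below assumes Eq. (1) has the form $d = \ln(N/z_1)/\ln(z_2/z_1) + 1$ and uses the relation $z_2 = \langle k^2 \rangle - \langle k \rangle$; the symbol z for the mean degree is introduced here only for illustration.

```latex
\begin{align*}
\langle k \rangle &= z, \qquad \langle k^2 \rangle = z^2 + z
  && \text{(Poisson degree distribution)} \\
z_1 &= z, \qquad z_2 = \langle k^2 \rangle - \langle k \rangle = z^2
  && \Rightarrow\ z_2 / z_1 = z \\
d &= \frac{\ln (N / z)}{\ln z} + 1 = \frac{\ln N}{\ln z}
\end{align*}
```

With a fixed mean degree the estimate grows like $\ln N$, so under this formula a Poisson graph cannot hold its diameter constant as it grows unless its mean degree grows with N.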

To calculate the diameter of a graph with degree distribution
$p(k)$, but with a finite number of nodes N, the size
of the graph must be parameterized through the degree
distribution. In any fully connected finite-sized graph,
the smallest possible degree is 1 and the largest degree
in the graph must be less than N. The parameterization
can be accomplished by imposing an N-dependent cutoff
function $k_c(N) < N$ and writing $\langle k^n \rangle = \sum_{k=1}^{k_c(N)} k^n p(k)$.

Additionally, a simple application of the generating
function method leads to the relationship $z_2 = \langle k^2 \rangle - \langle k \rangle$
($\langle k \rangle = z_1$ by definition), which allows the diameter to be
calculated from the first two moments of $p(k)$.
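
The calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the weights `p(k)` are normalized over the truncated range, and Eq. (1) is assumed to have the form $d = \ln(N/z_1)/\ln(z_2/z_1) + 1$.

```python
import math


def truncated_moments(p, k_c):
    """Return (<k>, <k^2>) for weights p(k), k = 1..k_c, after normalizing them."""
    ks = range(1, k_c + 1)
    weights = [p(k) for k in ks]
    norm = sum(weights)
    m1 = sum(k * w for k, w in zip(ks, weights)) / norm
    m2 = sum(k * k * w for k, w in zip(ks, weights)) / norm
    return m1, m2


def diameter_estimate(p, n_nodes, k_c):
    """Diameter estimate from the first two moments (assumed form of Eq. (1))."""
    z1, m2 = truncated_moments(p, k_c)
    z2 = m2 - z1  # z_2 = <k^2> - <k>
    return math.log(n_nodes / z1) / math.log(z2 / z1) + 1.0
```

For example, `diameter_estimate(lambda k: k ** -2.5, 1000, 999)` evaluates the estimate for a power law with exponent 2.5 truncated just below N = 1000.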

We seek a degree distribution that maintains its functional
form and has an approximately constant diameter
independent of N. Effectively, this means a single function
$p(k)$ that does not depend on $k_c$ except through its
normalization. (We rule out the most obvious example,
a star-shaped graph with N - 1 nodes each connected
to the Nth node. This graph, whose construction is deterministic,
has an approximately constant diameter of 2.)
Returning to Eq. (1), consider the first and second
terms. The first plus the third term are an upper bound
on d since the second term is strictly non-negative and
always < 1. In addition, for any fixed $p(k)$, the second
term is non-increasing as $k_c$ gets larger, and will approach
a constant < 1 if $p(k)$ has finite moments as $N \to \infty$. As
a result, for our purposes the first term is dominant, and
we will neglect the second. If the first term is a constant
c, then the diameter will always lie between c and c + 1.



Thus, in order to find the degree distribution with approximately
constant diameter independent of N, we set
the first term in Eq. (1) equal to a constant c, which
results in the requirement that

$$ N^{1/c} = z_2 / z_1. \qquad (2) $$



The ratio $z_2/z_1$ is also the average degree of a node found
by following a randomly chosen edge. This average
degree must always be less than the largest degree in
the graph, which provides a lower bound on the cutoff
function $k_c(N)$, resulting in $N^{1/c} < k_c(N) < N$ for
all N. The function $k_c(N)$ is therefore bounded from
above and below by power functions and we will use the
explicit form $k_c(N) = N^a$ where $1/c < a \le 1$. Setting
a = 1, we recover the least restrictive case where no
node has a degree greater than the size of the graph. Letting
a vary also allows us to see the consequences of different
cutoff dependencies if they are consistent with empirical
findings.

Then, using again the assumption $N \gg 1$, Eq. (2) can
be written as an equation for $p(k)$ involving the ratio of
its moments:

$$ k_c^{1/(ac)} = \frac{\sum_{k=1}^{k_c} k^2 p(k)}{\sum_{k=1}^{k_c} k\, p(k)}. \qquad (3) $$

The distribution $p(k)$ can be determined by writing this
equation for $k_c$ and $k_c \pm 1$. Algebraic manipulation yields
the recurrence relation

$$ p(k+1) = p(k)\, \frac{k \left[ k - (k-1)^{\beta} \right] \left[ (k+1)^{\beta} - k^{\beta} \right]}{(k+1) \left[ (k+1) - (k+1)^{\beta} \right] \left[ k^{\beta} - (k-1)^{\beta} \right]}, \qquad (4) $$

where $\beta = 1/(ac)$ for convenience. The full solution can
easily be calculated numerically. For most values of $\beta$ the
distribution approaches its scale-free asymptote rather
quickly (by k = 100), as shown in Fig. 1.
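
The numerical solution can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes the moment condition reads $\sum_{k \le K} k^2 p(k) / \sum_{k \le K} k\, p(k) = K^{\beta}$ for every cutoff K, and it enforces that condition directly, one degree at a time, instead of using a closed-form recurrence. The function names and the choice p(1) = 1 (an arbitrary overall scale) are hypothetical.

```python
import math


def constant_diameter_distribution(beta, k_max):
    """Unnormalized p(1..k_max) with sum(k^2 p) / sum(k p) = K**beta for every K."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    p = [0.0, 1.0]      # p[0] is unused padding; p[1] = 1 fixes the scale
    s1, s2 = 1.0, 1.0   # running sums of k*p(k) and k^2*p(k) up to K = 1
    for k in range(2, k_max + 1):
        # Choose p(k) so the moment ratio equals k**beta once degree k is included.
        pk = (k ** beta * s1 - s2) / (k ** 2 - k ** (beta + 1.0))
        p.append(pk)
        s1 += k * pk
        s2 += k * k * pk
    return p


def local_exponent(p, k_lo, k_hi):
    """Magnitude of the log-log slope of p between two degrees."""
    return -math.log(p[k_hi] / p[k_lo]) / math.log(k_hi / k_lo)
```

Calling `local_exponent(p, 5000, 10000)` on the result for a given `beta` gives a rough numerical estimate of the tail exponent, which can be compared with the analytic asymptote.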

A more explicit analytic characterization of the solution
can be found by using an integral approximation for
Eq. (3),

$$ k_c^{\beta} = \frac{\int_1^{k_c} k^2 p(k)\, dk}{\int_1^{k_c} k\, p(k)\, dk}. $$

This integral equation can be
solved exactly by turning it into a first-order differential
equation for the function $p(\cdot)$ by differentiations with respect
to $k_c$. Replacing $k_c$ with a dummy variable k, and
writing $\beta = 1/(ac)$ for convenience, the resulting equation
is $\left[ (\beta - 3) k + (\beta + 2) k^{\beta} \right] p(k) + k (k^{\beta} - k)\, p'(k) = 0$. The
unique solution, up to a normalization constant dependent
on $k_c$, is

$$ p(k) = \frac{\left( 1 - k^{\beta - 1} \right)^{\frac{2\beta - 1}{1 - \beta}}}{k^{3 - \beta}}. \qquad (5) $$
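
The step from the integral equation to the closed form can be checked by hand. The following is a sketch of one route, assuming the integral approximation $k^{\beta} \int_1^{k} k' p\, dk' = \int_1^{k} k'^2 p\, dk'$; the shorthand $I_1(k)$ is introduced here only for the derivation.

```latex
\begin{align*}
k^{\beta} I_1(k) &= \int_1^{k} k'^2 p(k')\, dk'
  && \text{with } I_1(k) = \int_1^{k} k' p(k')\, dk' \\
\beta k^{\beta-1} I_1 + k^{\beta+1} p &= k^{2} p
  && \text{(differentiate once)} \\
I_1 &= \tfrac{1}{\beta}\left(k^{3-\beta} - k^{2}\right) p \\
\beta k\, p &= \left[(3-\beta)k^{2-\beta} - 2k\right] p + \left(k^{3-\beta} - k^{2}\right) p'
  && \text{(differentiate again)} \\
0 &= \left[(\beta-3)k + (\beta+2)k^{\beta}\right] p + k\left(k^{\beta} - k\right) p'
  && \text{(multiply by } {-k^{\beta-1}} \text{ and collect terms)} \\
\frac{p'}{p} &= \frac{\beta-3}{k} + \frac{(2\beta-1)\,k^{\beta-2}}{1 - k^{\beta-1}} \\
\ln p &= (\beta-3)\ln k + \frac{2\beta-1}{1-\beta}\,\ln\!\left(1 - k^{\beta-1}\right) + \text{const}
\end{align*}
```

Exponentiating the last line gives a power law $k^{-(3-\beta)}$ multiplied by a factor that tends to 1 for large k when $\beta < 1$.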



FIG. 1: A plot of the numerically calculated distributions for
various values of $\beta$ and with $k_{max} = 10^4$.

The solution is in good agreement with the result of numerical
iteration of the exact result of Eq. (4) away from
k = 1. The singular behavior for $\beta < 1/2$ at k = 1 is
the result of the continuous approximation. Nevertheless,
Eq. (5) well approximates the form of the distribution
with an approximately constant diameter that is always
between c and c + 1, approaching c + 1 asymptotically
with N. For $k \gg 1$, it is also a scale-free distribution
with exponent $\gamma = 3 - 1/(ac)$. Since a > 1/c and c > 1
is a natural bound on c, the scale-free exponent satisfies
$2 < \gamma < 3$, and it can be written in terms of the diameter
d and the cutoff exponent a as $\gamma = 3 - \frac{1}{a(d-1)}$.
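
The relation between the exponent, the cutoff exponent and the diameter is simple enough to tabulate. The helper below is a hypothetical sketch: it assumes the large-N identification c = d - 1 (the diameter approaching c + 1) and the range 1/c < a <= 1 for the cutoff exponent, and it rejects inputs outside that range.

```python
def scale_free_exponent(a, d):
    """gamma = 3 - 1/(a*c), using the large-N approximation c = d - 1."""
    c = d - 1.0
    if c <= 1.0:
        raise ValueError("need c = d - 1 > 1")
    if not (1.0 / c < a <= 1.0):
        raise ValueError("cutoff exponent a must satisfy 1/c < a <= 1")
    return 3.0 - 1.0 / (a * c)
```

Within the allowed range the result always falls strictly between 2 and 3, and smaller cutoff exponents push it toward 2.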

These results are valid for the model of random graphs
constructed according to the simple method assumed in
Ref. [20]. Even for these ideal graphs, Eq. (1) numerically
underestimates the actual path lengths, as is obvious
from its method of derivation, but it does capture
the proper scaling behavior. Real-world graphs with additional
non-random structure can have even longer average
shortest paths. Nevertheless, as shown in [21], [22], [23],
the formula does approximate the diameter for many
real-world graphs, especially with respect to the dependence
of the average path length on N. In our experiments
on two diverse real-world data sets, we also find the
numerical underestimation. Still, a comparison between
the behavior of Eq. (1) and measured diameters on our
data sets for all N shows that this analytical treatment
tracks the basic features of the data quite well (see Figs.
2a and 3a).
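
For comparison with the analytic estimate, the measured quantity is the average shortest path over pairs of nodes. The text does not specify how it was computed; one common approach, sketched here with hypothetical function names, is a breadth-first search from every node of an adjacency-list graph, averaging over the pairs that are actually reachable.

```python
from collections import deque


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs (source, target) with target reachable."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def star_graph(n):
    """Undirected star: nodes 1..n-1 each linked to hub 0."""
    adj = {0: list(range(1, n))}
    for leaf in range(1, n):
        adj[leaf] = [0]
    return adj
```

On a star graph this average equals 2 - 2/N, which gives a convenient sanity check for the routine before applying it to measured network maps.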

We next consider our assumptions and results in the 
context of two data sets, each representing snapshots of 
the topologies of graphs that grow in an environment 
with possible selection pressures. The first concerns the 
organization of essential biological processes within the 
cell for a variety of simple organisms, and the second con- 
cerns the large-scale structure of the Internet backbone. 
While these are, of course, very different examples, we 
will show that they do share features consistent with the 
arguments just presented, and we provide reasons why
a selection argument related to the diameter of these networks
is reasonable.

FIG. 2: (a) The diameter d is almost constant as a function
of N, the size of the metabolic network. The top set of points
are the measured average shortest path on the graphs, while
the bottom set shows the quantity computed from the degree
distributions using Eq. (1). (b) The maximum in-degree and
out-degree of a node scale linearly with the size of the
network.

Metabolic pathways are complex biological networks 
in which a series of enzymatic reactions produce specific 
products within cells. Large scale sequencing projects 
have furnished integrated pathway-genome databases [25, 26]
from which organism-specific metabolic networks
can be inferred. In a metabolic network, nodes represent 
the substrates, and a directed link connects the educt to 
the product of a metabolic reaction. 

Recently, such databases have been used to analyze
the topological properties of the metabolic networks
of 43 different organisms, including E. coli (bacterium)
and Caenorhabditis elegans (eukaryote). The network degree
distributions were found to be uniformly scale-free
with exponents between 2.0 and 2.4. A striking feature
of the metabolic networks studied is that even though
their sizes vary between 200 and 800 nodes, the diameter
stays approximately fixed between 3.0 and 3.5, as shown
in Fig. 2a.

It has been speculated that metabolic networks
may have evolved to maintain a constant diameter in
order to minimize the number of sequential reactions
necessary to obtain a particular product. For example,
it was found that there are several possible pathways
which could provide the same chemical solution as
the Krebs cycle, but the true Krebs cycle is the most
efficient and contains the fewest steps [27]. Another
possible evolutionary force is opportunism, where a
new metabolic pathway is developed by re-using enzymes
catalyzing reactions already in the cell rather than developing
an entirely new pathway from scratch. Thus, as
the network evolves, existing substrates are incorporated
into new pathways and their connectivity grows. Fig. 2b
shows that the degree of the most connected node grows
linearly with the size of the metabolic network, in agreement
with the assumption $k_c \sim N^a$.

FIG. 3: (a) Evolution of the Internet backbone diameter with
the number of nodes recorded on NLANR Internet maps. The
top curve is the measured average shortest path on the graphs,
while the bottom curve shows the quantity computed from the
degree distributions using Eq. (1). Days where the mapping
was incomplete are omitted. (b) The maximum degree of a
node scales linearly with the size of the network.

Further potential benefits of small diameters include
the reduction in the transition time between metabolic
states in response to environmental changes. Networks
with robustly small average path lengths have been
found to rapidly adjust to perturbations. Thus there
may be a selective advantage to maintaining a small diameter.

The Internet backbone is a second example of a network
where maintaining a constant diameter is important.
Data is routed on the Internet between tens of
millions of host computers by breaking the data up into
packets, each of which is routed individually and then
re-assembled upon arrival at its destination. The packets
hop from node to node in the network. Each additional
hop a packet must make introduces latency and
increases the potential for signal degradation through errors
and delays. We measured the diameter of Internet
maps from November 1997 to January 2000 gathered by
the National Laboratory for Applied Network Research
(NLANR) (http://moat.lanr.net). Each node is an autonomous
system (AS) usually corresponding to a single
Internet Service Provider, and the links represent inter-ISP
connections. The Internet backbone connectivity
distribution is power-law with an exponent $\gamma = 2.2 \pm 0.1$
invariant over time [18]. An example degree distribution
is shown in Fig. 4 along with a distribution calculated
using the recursion in Eq. (4). Fig. 3a shows,
consistent with previous measurements [18], that the diameter
stays approximately constant at 3.7 hops over the
2 year period, while the number of nodes doubles from
three to six thousand. It appears that the Internet backbone
may have evolved to connect a greater number of
ISPs but has kept the average number of hops Internet
traffic must make low.

FIG. 4: An example degree distribution for the Internet backbone
compared with a distribution computed using the recursion
in Eq. (4).

In summary, we have presented a plausible reason for 
the existence of scale-free distributions observed in two 
contexts, metabolic networks and the Internet backbone, 
where there are evolutionary pressures to maintain a 
small diameter. Our analysis shows that for a robust 
network to maintain its diameter, the form of its degree 
distribution should be scale-free. We have further shown 
our assumptions to be consistent with observed features 
in the two data sets. Combined with endogenous mod- 
els of preferential attachment, and the error tolerance 
of scale-free networks, our results help further explain 
the prevalence of scale-free networks in selective environ- 
ments. 

We acknowledge B. A. Huberman for many useful dis- 
cussions and A. R. Puniyani for contributions to earlier 
versions of this work. 